AITopics | policy cover

Efficient Model-Free Exploration in Low-Rank MDPs

Neural Information Processing Systems, Feb-17-2026, 06:54:43 GMT

What are the right computational primitives for exploration?

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Necessary and Sufficient Oracles: Toward a Computational Taxonomy For Reinforcement Learning

Rohatgi, Dhruv, Foster, Dylan J.

arXiv.org Artificial Intelligence, Feb-12-2025

Algorithms for reinforcement learning (RL) in large state spaces crucially rely on supervised learning subroutines to estimate objects such as value functions or transition probabilities. Since only the simplest supervised learning problems can be solved provably and efficiently, practical performance of an RL algorithm depends on which of these supervised learning "oracles" it assumes access to (and how they are implemented). But which oracles are better or worse? Is there a minimal oracle? In this work, we clarify the impact of the choice of supervised learning oracle on the computational complexity of RL, as quantified by the oracle strength. First, for the task of reward-free exploration in Block MDPs in the standard episodic access model -- a ubiquitous setting for RL with function approximation -- we identify two-context regression as a minimal oracle, i.e. an oracle that is both necessary and sufficient (under a mild regularity assumption). Second, we identify one-context regression as a near-minimal oracle in the stronger reset access model, establishing a provable computational benefit of resets in the process. Third, we broaden our focus to Low-Rank MDPs, where we give cryptographic evidence that the analogous oracle from the Block MDP setting is insufficient.

machine learning, regression, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2502.08632

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.63)
Workflow (0.46)

Industry: Education (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.45)

Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles

Golowich, Noah, Moitra, Ankur, Rohatgi, Dhruv

arXiv.org Machine Learning, Sep-18-2023

The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $\phi(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the "kitchen sink" approach and hope that the true features are included in a much larger set of potential features. In this paper we revisit linear MDPs from the perspective of feature selection. In a $k$-sparse linear MDP, there is an unknown subset $S \subset [d]$ of size $k$ containing all the relevant features, and the goal is to learn a near-optimal policy in only poly$(k,\log d)$ interactions with the environment. Our main result is the first polynomial-time algorithm for this problem. In contrast, earlier works either made prohibitively strong assumptions that obviated the need for exploration, or required solving computationally intractable optimization problems. Along the way we introduce the notion of an emulator: a succinct approximate representation of the transitions that suffices for computing certain Bellman backups. Since linear MDPs are a non-parametric model, it is not even obvious whether polynomial-sized emulators exist. We show that they do exist and can be computed efficiently via convex programming. As a corollary of our main result, we give an algorithm for learning a near-optimal policy in block MDPs whose decoding function is a low-depth decision tree; the algorithm runs in quasi-polynomial time and takes a polynomial number of samples. This can be seen as a reinforcement learning analogue of classic results in computational learning theory. Furthermore, it gives a natural model where improving the sample complexity via representation learning is computationally feasible.
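
As a reading aid, one common way to write the linear MDP assumption in the abstract's notation is sketched below. The symbols $\mu$ and $\theta$ do not appear in the abstract and are assumed here for illustration; the paper's exact definition may differ in details such as normalization or per-step indexing.

$$
P(x' \mid x, a) = \langle \phi(x, a), \mu(x') \rangle, \qquad r(x, a) = \langle \phi(x, a), \theta \rangle .
$$

Under $k$-sparsity, only the coordinates in $S$ would matter, so the inner products reduce to $\sum_{i \in S} \phi_i(x, a)\,\mu_i(x')$ and $\sum_{i \in S} \phi_i(x, a)\,\theta_i$, with $|S| = k \ll d$.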

artificial intelligence, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

2309.09457

Country:

North America > United States (0.67)
Europe (0.14)

Genre: Workflow (0.93)

Industry:

Energy > Oil & Gas (0.67)
Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Efficient Model-Free Exploration in Low-Rank MDPs

Mhammedi, Zakaria, Block, Adam, Foster, Dylan J., Rakhlin, Alexander

arXiv.org Artificial Intelligence, Jul-8-2023

A major challenge in reinforcement learning is to develop practical, sample-efficient algorithms for exploration in high-dimensional domains where generalization and function approximation are required. Low-Rank Markov Decision Processes -- where transition probabilities admit a low-rank factorization based on an unknown feature embedding -- offer a simple, yet expressive framework for RL with function approximation, but existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions such as latent variable structure, access to model-based function approximation, or reachability. In this work, we propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs that is both computationally efficient and model-free, allowing for general function approximation and requiring no additional structural assumptions. Our algorithm, VoX, uses the notion of a generalized optimal design for the feature embedding as an efficiently computable basis for exploration, performing efficient optimal design computation by interleaving representation learning and policy optimization. Our analysis -- which is appealingly simple and modular -- carefully combines several techniques, including a new reduction from optimal design computation to policy optimization based on the Frank-Wolfe method, and an improved analysis of a certain minimax representation learning objective found in prior work.
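
The abstract mentions a reduction built on the Frank-Wolfe method for optimal design computation. The sketch below is not VoX: it only illustrates the classical Frank-Wolfe iteration for a D-optimal design over a small, known, finite set of feature vectors, assuming NumPy is available. The function name, the toy feature matrix `X`, and the stopping tolerance are hypothetical choices made for illustration.

```python
import numpy as np

def frank_wolfe_d_optimal(X, max_iters=1000, tol=1e-3):
    """Approximate a D-optimal design over the rows of X with Frank-Wolfe.

    X is an (n, d) array with d >= 2 whose rows are assumed to span R^d.
    Returns weights w on the simplex that approximately maximize
    log det(sum_i w_i x_i x_i^T).
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                      # start from the uniform design
    for _ in range(max_iters):
        A = X.T @ (w[:, None] * X)               # design matrix sum_i w_i x_i x_i^T
        # g_i = x_i^T A^{-1} x_i is the partial derivative of log det w.r.t. w_i
        g = np.einsum("ij,ij->i", X @ np.linalg.inv(A), X)
        i = int(np.argmax(g))                    # best vertex of the simplex
        if g[i] <= d * (1.0 + tol):              # at the optimum, max_i g_i equals d
            break
        gamma = (g[i] / d - 1.0) / (g[i] - 1.0)  # exact line search step for log det
        w = (1.0 - gamma) * w                    # shrink all weights ...
        w[i] += gamma                            # ... and move mass to vertex i
    return w

# Toy (hypothetical) feature set: three basis vectors plus one interior point.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.5]])
w = frank_wolfe_d_optimal(X)
```

Each iteration solves a linear maximization over the simplex (the `argmax`) and then takes a convex-combination step. In the Low-Rank MDP setting the features are unknown, which is presumably why the paper replaces this lookup with policy optimization; that part is not modeled here.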

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2307.03997

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > Florida > Orange County > Orlando (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Representation Learning with Multi-Step Inverse Kinematics: An Efficient and Optimal Approach to Rich-Observation RL

Mhammedi, Zakaria, Foster, Dylan J., Rakhlin, Alexander

arXiv.org Artificial Intelligence, Apr-12-2023

We study the design of sample-efficient algorithms for reinforcement learning in the presence of rich, high-dimensional observations, formalized via the Block MDP problem. Existing algorithms suffer from either 1) computational intractability, 2) strong statistical assumptions that are not necessarily satisfied in practice, or 3) suboptimal sample complexity. We address these issues by providing the first computationally efficient algorithm that attains rate-optimal sample complexity with respect to the desired accuracy level, with minimal statistical assumptions. Our algorithm, MusIK, combines systematic exploration with representation learning based on multi-step inverse kinematics, a learning objective in which the aim is to predict the learner's own action from the current observation and observations in the (potentially distant) future. MusIK is simple and flexible, and can efficiently take advantage of general-purpose function approximation. Our analysis leverages several new techniques tailored to non-optimistic exploration algorithms, which we anticipate will find broader use.
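
To make the multi-step inverse kinematics objective concrete, here is a minimal tabular sketch using only the Python standard library. It is not MusIK: the chain environment, the function names, and the use of raw states in place of rich observations are all simplifying assumptions, and counting stands in for the learned classifier a real implementation would train.

```python
import random
from collections import Counter, defaultdict

def step(state, action):
    """Hypothetical chain: action 1 moves right, action 0 moves left, clipped to [0, 4]."""
    return max(0, min(4, state + (1 if action == 1 else -1)))

def collect(num_episodes, horizon, rng):
    """Roll out uniformly random actions from state 2; each trajectory is a list of (state, action)."""
    episodes = []
    for _ in range(num_episodes):
        s, traj = 2, []
        for _ in range(horizon):
            a = rng.randrange(2)
            traj.append((s, a))
            s = step(s, a)
        traj.append((s, None))            # final state, no action taken
        episodes.append(traj)
    return episodes

def inverse_kinematics_counts(episodes, h, k):
    """Estimate P(a_h | x_h, x_{h+k}) by counting co-occurrences."""
    counts = defaultdict(Counter)
    for traj in episodes:
        x_now, a_now = traj[h]
        x_future, _ = traj[h + k]
        counts[(x_now, x_future)][a_now] += 1
    return {
        key: {a: c / sum(ctr.values()) for a, c in ctr.items()}
        for key, ctr in counts.items()
    }

rng = random.Random(0)
episodes = collect(num_episodes=2000, horizon=3, rng=rng)
one_step = inverse_kinematics_counts(episodes, h=0, k=1)
two_step = inverse_kinematics_counts(episodes, h=0, k=2)
```

In this toy chain, reaching state 4 two steps after state 2 is only possible if the first action was "right", so the two-step predictor is certain for that pair, while the pair (2, 2) leaves the first action ambiguous.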

artificial intelligence, lemma 4, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2304.05889

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Agarwal, Alekh, Henaff, Mikael, Kakade, Sham, Sun, Wen

arXiv.org Artificial Intelligence, Aug-13-2020

Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of interest, and they allow for richly parameterized policies. Their primary drawback is that, by being local in nature, they fail to adequately explore the environment. In contrast, while model-based approaches and Q-learning directly handle exploration through the use of optimism, their ability to handle model misspecification and function approximation is far less evident. This work introduces the Policy Cover-Policy Gradient (PC-PG) algorithm, which provably balances the exploration vs. exploitation tradeoff using an ensemble of learned policies (the policy cover). PC-PG enjoys polynomial sample complexity and run time for both tabular MDPs and, more generally, linear MDPs in an infinite dimensional RKHS. Furthermore, PC-PG also has strong guarantees under model misspecification that go beyond the standard worst case $\ell_{\infty}$ assumptions; this includes approximation guarantees for state aggregation under an average case error assumption, along with guarantees under a more general assumption where the approximation error under distribution shift is controlled. We complement the theory with empirical evaluation across a variety of domains in both reward-free and reward-driven settings.
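
The idea of a policy cover, an ensemble of policies whose mixture visits more of the state space than any single member, can be illustrated with a small standard-library sketch. This is not the PC-PG algorithm: the integer-line environment, the two hand-written policies, and the function names are hypothetical, and no policy gradient update or exploration bonus is modeled.

```python
import random
from collections import Counter

def rollout(policy, horizon, start=0):
    """Run a deterministic policy (state -> step in {-1, +1}) on the integer line."""
    s, visited = start, [start]
    for _ in range(horizon):
        s += policy(s)
        visited.append(s)
    return visited

def mixture_visitation(cover, horizon, num_rollouts, rng):
    """Empirical state-visitation counts under the uniform mixture over `cover`."""
    counts = Counter()
    for _ in range(num_rollouts):
        policy = rng.choice(cover)        # sample one policy uniformly from the cover
        counts.update(rollout(policy, horizon))
    return counts

# Two hypothetical policies: one always moves left, one always moves right.
cover = [lambda s: -1, lambda s: +1]
counts = mixture_visitation(cover, horizon=3, num_rollouts=200, rng=random.Random(0))
```

Either policy alone reaches only one side of the line, while the mixture's visitation counts cover both sides. That is the sense in which the ensemble "covers" the state space in this toy setting.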

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2007.08459

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey (0.04)
Europe > Italy > Sardinia (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Kinematic State Abstraction and Provably Efficient Rich-Observation Reinforcement Learning

Misra, Dipendra, Henaff, Mikael, Krishnamurthy, Akshay, Langford, John

arXiv.org Machine Learning, Nov-13-2019

We present an algorithm, HOMER, for exploration and reinforcement learning in rich observation environments that are summarizable by an unknown latent state space. The algorithm interleaves representation learning to identify a new notion of kinematic state abstraction with strategic exploration to reach new states using the learned abstraction. The algorithm provably explores the environment with sample complexity scaling polynomially in the number of latent states and the time horizon, and, crucially, with no dependence on the size of the observation space, which could be infinitely large. This exploration guarantee further enables sample-efficient global policy optimization for any reward function. On the computational side, we show that the algorithm can be implemented efficiently whenever certain supervised learning problems are tractable. Empirically, we evaluate HOMER on a challenging exploration problem, where we show that the algorithm is exponentially more sample efficient than standard reinforcement learning baselines.

abstraction, policy cover, probability, (16 more...)

arXiv.org Machine Learning

1911.05815

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Illinois (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Provably efficient RL with Rich Observations via Latent State Decoding

Du, Simon S., Krishnamurthy, Akshay, Jiang, Nan, Agarwal, Alekh, Dudík, Miroslav, Langford, John

arXiv.org Machine Learning, Jan-25-2019

We study the exploration problem in episodic MDPs with rich observations generated from a small number of latent states. Under certain identifiability assumptions, we demonstrate how to estimate a mapping from the observations to latent states inductively through a sequence of regression and clustering steps (where previously decoded latent states provide labels for later regression problems) and use it to construct good exploration policies. We provide finite-sample guarantees on the quality of the learned state decoding function and exploration policies, and complement our theory with an empirical evaluation on a class of hard exploration problems. Our method exponentially improves over $Q$-learning with naïve exploration, even when $Q$-learning has cheating access to latent states.

probability, provably efficient rl, rich observation, (14 more...)

arXiv.org Machine Learning

1901.09018

Country:

North America > United States > New York (0.04)
North America > United States > Illinois (0.04)
Europe > Sweden > Skåne County > Malmö (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(2 more...)
